    LIMIX: genetic analysis of multiple traits

    Multi-trait mixed models have emerged as a promising approach for joint analyses of multiple traits. In principle, the mixed model framework is remarkably general. However, current methods implement only a very specific range of tasks to optimize the necessary computations. Here, we present a multi-trait modeling framework that is versatile and fast: LIMIX enables users to flexibly adapt mixed models for a broad range of applications with different observed and hidden covariates, and variable study designs. To highlight the novel modeling aspects of LIMIX, we performed three vastly different genetic studies: joint GWAS of correlated blood lipid phenotypes, joint analysis of the expression levels of the multiple transcript isoforms of a gene, and pathway-based modeling of molecular traits across environments. In these applications, we show that LIMIX increases GWAS power and phenotype prediction accuracy, in particular when integrating stepwise multi-locus regression into multi-trait models and when analyzing large numbers of traits. An open source implementation of LIMIX is freely available at: https://github.com/PMBio/limix

    Expression QTLs Mapping and Analysis: A Bayesian Perspective.

    The aim of expression Quantitative Trait Locus (eQTL) mapping is the identification of DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association analyses (GWAS), eQTL mapping has become a useful tool to understand the functional context in which these variants operate and eventually to narrow down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing for association of all transcript-marker pairs. However, these "one-at-a-time" strategies (1) are unable to control the number of false positives when an intricate linkage disequilibrium structure is present and (2) are often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
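
    The univariate "one-at-a-time" strategy described in this abstract tests every transcript-marker pair separately, so the number of tests grows as (number of transcripts) × (number of markers). A minimal NumPy sketch, assuming a plain Pearson correlation test with no covariates or kinship correction; the function name `univariate_eqtl_scan` and the toy data are hypothetical and for illustration only.

```python
import numpy as np

def univariate_eqtl_scan(expr, geno):
    """Hypothetical helper: test every transcript-marker pair independently.

    expr: (n_samples, n_genes) expression values.
    geno: (n_samples, n_markers) genotype dosages coded 0/1/2.
    Returns Pearson r and t-statistics (n - 2 d.o.f.), each (n_genes, n_markers).
    """
    n = expr.shape[0]
    # Standardize columns (population std, ddof=0) so E.T @ G / n equals Pearson r.
    E = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    G = (geno - geno.mean(axis=0)) / geno.std(axis=0)  # monomorphic markers give NaN
    r = E.T @ G / n                                    # all pairwise correlations at once
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, t

# Toy data: 100 samples, 50 transcripts, 20 markers; transcript 0 is driven by marker 3.
rng = np.random.default_rng(1)
n = 100
geno = rng.integers(0, 3, size=(n, 20)).astype(float)
expr = rng.standard_normal((n, 50))
expr[:, 0] += 0.8 * geno[:, 3]

r, t = univariate_eqtl_scan(expr, geno)
print(r.shape)                        # (50, 20): 1000 separate tests
print(int(np.argmax(np.abs(t[0]))))   # strongest marker for transcript 0
```

    Each test ignores all other markers, which is why correlated markers in linkage disequilibrium can all appear significant at once.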

    It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals

    Multi-task prediction methods are widely used to couple regressors or classification models by sharing information across related tasks. We propose a multi-task Gaussian process approach for modeling both the relatedness between regressors and the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covariance term in the form of a sum of Kronecker products, for which efficient parameter inference and out-of-sample prediction are feasible. On both synthetic examples and applications to phenotype prediction in genetics, we find substantial benefits of modeling structured noise compared to established alternatives.
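
    Efficient inference with a sum of two Kronecker products rests on a standard linear-algebra identity: a covariance K = C ⊗ R + Σ ⊗ I_N (task covariances C and Σ of size T×T, sample covariance R of size N×N) can be whitened by Σ^(-1/2) and then diagonalized with two small eigendecompositions instead of one NT×NT factorization. A minimal NumPy sketch of that identity, computing log|K| and K^(-1) vec(Y); the function name and the row-major vec convention are assumptions for illustration, and this is not the authors' implementation.

```python
import numpy as np

def kronsum_logdet_solve(C, Sigma, R, Y):
    """Hypothetical helper: log|K| and K^{-1} vec(Y) for K = kron(C, R) + kron(Sigma, I_N).

    C, Sigma: (T, T) task covariances (Sigma positive definite).
    R: (N, N) covariance between samples.
    Y: (T, N) data; vec(Y) is Y.ravel() (row-major), matching np.kron ordering.
    Cost is O(T^3 + N^3) instead of O((T N)^3).
    """
    N = Y.shape[1]
    s_sig, U_sig = np.linalg.eigh(Sigma)
    L = (U_sig / np.sqrt(s_sig)).T            # L = S^{-1/2} U^T, so L Sigma L^T = I
    s_c, U_c = np.linalg.eigh(L @ C @ L.T)    # whitened task covariance
    s_r, U_r = np.linalg.eigh(R)
    D = np.outer(s_c, s_r) + 1.0              # eigenvalues of kron(L C L^T, R) + I
    logdet = np.log(D).sum() + N * np.log(s_sig).sum()
    # Apply K^{-1} = (L ⊗ I)^T (U_c ⊗ U_r) D^{-1} (U_c ⊗ U_r)^T (L ⊗ I)
    # via the identity kron(A, B) @ X.ravel() == (A @ X @ B.T).ravel().
    Z = (U_c.T @ (L @ Y) @ U_r) / D
    X = L.T @ (U_c @ Z @ U_r.T)
    return logdet, X

rng = np.random.default_rng(0)
T, N = 3, 5
A = rng.standard_normal((T, T))
C = A @ A.T
B = rng.standard_normal((T, T))
Sigma = B @ B.T + T * np.eye(T)
G = rng.standard_normal((N, 4))
R = G @ G.T
Y = rng.standard_normal((T, N))

logdet, X = kronsum_logdet_solve(C, Sigma, R, Y)
K = np.kron(C, R) + np.kron(Sigma, np.eye(N))   # explicit NT x NT matrix, for checking only
print(np.isclose(logdet, np.linalg.slogdet(K)[1]))
print(np.allclose(X.ravel(), np.linalg.solve(K, Y.ravel())))
```

    The explicit K is built only to check the result; the helper itself never forms an NT×NT matrix.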

    A Lasso multi-marker mixed model for association mapping with population structure correction

    Motivation: Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In traits with simple Mendelian architectures, single polymorphic loci explain a significant fraction of the phenotypic variability. However, many traits of interest seem to be subject to multifactorial control by groups of genetic loci. Accurate detection of such multivariate associations is non-trivial and often compromised by limited statistical power. At the same time, confounding influences, such as population structure, cause spurious association signals that result in false-positive findings. Results: We propose LMM-Lasso, a linear mixed model that allows for both multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters; it effectively controls for population structure and scales to genome-wide datasets. LMM-Lasso simultaneously discovers likely causal variants and allows for multi-marker-based phenotype prediction from genotype. We demonstrate the practical use of LMM-Lasso in genome-wide association studies in Arabidopsis thaliana and linkage mapping in mouse, where our method achieves significantly more accurate phenotype prediction for 91% of the considered phenotypes. At the same time, our model dissects the phenotypic variability into components that result from individual single nucleotide polymorphism effects and population structure. Enrichment of known candidate genes suggests that the individual associations retrieved by LMM-Lasso are likely to be genuine. Availability: Code available under http://webdav.tuebingen.mpg.de/u/karsten/Forschung/research.html
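
    The combination of a mixed model with a Lasso can be sketched as follows: if the variance ratio δ = σ_e²/σ_g² is held fixed (for example, estimated once under a model without SNP effects), then y ~ N(Xβ, σ_g²(K + δI)) can be decorrelated with the eigendecomposition of the kinship K, and the multi-marker fit reduces to an ordinary Lasso on the rotated data. A minimal NumPy sketch under those assumptions: δ is treated as known, the penalty `lam` is simply fixed by hand for this toy example, and the helpers `rotate_for_lmm` and `lasso_cd` (plain coordinate descent) are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def rotate_for_lmm(X, y, K, delta):
    """Hypothetical helper: whiten data for y ~ N(X b, sg2 * (K + delta * I)), K = U diag(S) U^T."""
    S, U = np.linalg.eigh(K)
    w = 1.0 / np.sqrt(S + delta)
    return (U.T @ X) * w[:, None], (U.T @ y) * w

def lasso_cd(X, y, lam, n_iter=200):
    """Hypothetical helper: coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    resid = y.copy()                          # residual y - X b with b = 0
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            resid += X[:, j] * b[j]           # partial residual without feature j
            rho = X[:, j] @ resid
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= X[:, j] * b[j]
    return b

# Toy data: 200 individuals, 30 candidate SNPs (0 and 1 causal), plus a polygenic background.
rng = np.random.default_rng(2)
n, p, m = 200, 30, 100
Z = rng.standard_normal((n, m))
K = Z @ Z.T / m                               # kinship-like covariance
g = Z @ rng.standard_normal(m) / np.sqrt(m)   # background effect, g ~ N(0, K)
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + g + rng.standard_normal(n)

Xr, yr = rotate_for_lmm(X, y, K, delta=1.0)   # delta = sigma_e^2 / sigma_g^2 = 1 in this toy
b = lasso_cd(Xr, yr, lam=30.0)
print(np.flatnonzero(b))
```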

    ccSVM: correcting Support Vector Machines for confounding factors in biological data classification

    Motivation: Classifying biological data into different groups is a central task of bioinformatics: for instance, to predict the function of a gene or protein, the disease state of a patient, or the phenotype of an individual based on its genotype. Support Vector Machines are a widespread approach for classifying biological data, due to their high accuracy, their ability to deal with structured data such as strings, and the ease of integrating various types of data. However, it is unclear how to correct for confounding factors such as population structure, age, gender, or experimental conditions in Support Vector Machine classification.

    Genetic architecture of nonadditive inheritance in Arabidopsis thaliana hybrids

    The ubiquity of nonparental hybrid phenotypes, such as hybrid vigor and hybrid inferiority, has interested biologists for over a century and is of considerable agricultural importance. Although examples of both phenomena have been subject to intense investigation, no general model for the molecular basis of nonadditive genetic variance has emerged, and prediction of hybrid phenotypes from parental information continues to be a challenge. Here we explore the genetics of hybrid phenotype in 435 Arabidopsis thaliana individuals derived from intercrosses of 30 parents in a half diallel mating scheme. We find that nonadditive genetic effects are a major component of genetic variation in this population and that the genetic basis of hybrid phenotype can be mapped using genome-wide association (GWA) techniques. Significant loci together can explain as much as 20% of phenotypic variation in the surveyed population and include examples that have both classical dominant and overdominant effects. One candidate region inherited dominantly in the half diallel contains the gene for the MADS-box transcription factor AGAMOUS-LIKE 50 (AGL50), which we show directly to alter flowering time in the predicted manner. Our study not only illustrates the promise of GWA approaches to dissect the genetic architecture underpinning hybrid performance but also demonstrates the contribution of classical dominance to genetic variance.
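
    The distinction between classical dominance and overdominance can be made concrete with the textbook additive/dominance coding of one biallelic locus: genotype values μ − a, μ + d, and μ + a for the first homozygote, the heterozygote, and the second homozygote. Then d/a = 0 is additive, |d/a| = 1 is complete dominance, and |d/a| > 1 is overdominance. A minimal least-squares sketch under that standard coding, which is not necessarily the model used in the study; the function name `additive_dominance_fit` and the noise-free toy phenotypes are hypothetical.

```python
import numpy as np

def additive_dominance_fit(geno, y):
    """Hypothetical helper: least-squares a and d in y = mu + a*x + d*h at one locus.

    geno: allele counts 0/1/2; x = geno - 1 places homozygotes at -1 and +1,
    h = 1 for heterozygotes and 0 otherwise.
    """
    x = geno - 1.0
    h = (geno == 1).astype(float)
    design = np.column_stack([np.ones_like(x), x, h])
    mu, a, d = np.linalg.lstsq(design, y, rcond=None)[0]
    return a, d

geno = np.repeat([0, 1, 2], 50)
y_dominant = np.repeat([0.0, 2.0, 2.0], 50)       # heterozygote equals one homozygote
y_overdominant = np.repeat([0.0, 3.0, 2.0], 50)   # heterozygote exceeds both homozygotes
for label, y in [("dominant", y_dominant), ("overdominant", y_overdominant)]:
    a, d = additive_dominance_fit(geno, y)
    print(label, round(d / a, 3))   # prints 1.0 and 2.0, respectively
```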

    Genomic Profiles of Diversification and Genotype–Phenotype Association in Island Nematode Lineages

    Understanding how new species form requires investigation of evolutionary forces that cause phenotypic and genotypic changes among populations. However, the mechanisms underlying speciation vary and little is known about whether genomes diversify in the same ways in parallel at the incipient scale. We address this using the nematode, Pristionchus pacificus, which resides at an interesting point on the speciation continuum (distinct evolutionary lineages without reproductive isolation), and inhabits heterogeneous environments subject to divergent environmental pressures. Using whole genome re-sequencing of 264 strains, we estimate FST to identify outlier regions of extraordinary differentiation (∼1.725 Mb of the 172.5 Mb genome). We find evidence for shared divergent genomic regions occurring at a higher frequency than expected by chance among populations of the same evolutionary lineage. We use allele frequency spectra to find that, among lineages, 53% of divergent regions are consistent with adaptive selection, whereas 24% and 23% of such regions suggest background selection and restricted gene flow, respectively. In contrast, among populations from the same lineage, similar proportions (34-48%) of divergent regions correspond to adaptive selection and restricted gene flow, whereas 13-22% suggest background selection. Because speciation often involves phenotypic and genomic divergence, we also evaluate phenotypic variation, focusing on pH tolerance, which we find is diverging in a manner corresponding to environmental differences among populations. Taking a genome-wide association approach, we functionally validate a significant genotype-phenotype association for this trait. Our results are consistent with P. pacificus undergoing heterogeneous genotypic and phenotypic diversification related to both evolutionary and environmental processes.
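
    The size of the outlier set follows from simple arithmetic: 1.725 Mb of a 172.5 Mb genome is 1.725 / 172.5 = 0.01, i.e. about 1% of the genome, which is consistent with flagging roughly the top percentile of differentiation. A minimal NumPy sketch of such a windowed scan between two populations, assuming Hudson's F_ST estimator combined as a ratio of averages and fixed-size SNP windows; the estimator choice, the window scheme, the function names, and the toy data are all assumptions, not the study's pipeline.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hypothetical helper: Hudson F_ST over a set of SNPs, as a ratio of averages.

    p1, p2: sample allele frequencies; n1, n2: number of sampled chromosomes.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()

def outlier_windows(p1, p2, n1, n2, window, top=0.01):
    """Hypothetical helper: F_ST per block of `window` SNPs; flag windows above the (1 - top) quantile."""
    n_win = len(p1) // window
    fst = np.array([hudson_fst(p1[i * window:(i + 1) * window],
                               p2[i * window:(i + 1) * window], n1, n2)
                    for i in range(n_win)])
    cutoff = np.quantile(fst, 1 - top)
    return fst, np.flatnonzero(fst > cutoff)

# Toy data: 100 windows of 50 SNPs; window 7 is strongly differentiated.
rng = np.random.default_rng(3)
window = 50
p1 = rng.uniform(0.1, 0.9, 100 * window)
p2 = np.clip(p1 + rng.normal(0, 0.05, p1.size), 0.01, 0.99)
p2[7 * window:8 * window] = 1 - p1[7 * window:8 * window]
fst, outliers = outlier_windows(p1, p2, n1=40, n2=40, window=window)
print(outliers)
```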